\subsection{ RAKE }
Rapid Automatic Keyword Extraction (\textbf{RAKE})~\citep{rose2010automatic} is an unsupervised, language-independent method for extracting keywords from individual documents. RAKE is based on the observation that keywords frequently contain multiple words but rarely contain stop words or punctuation.

RAKE takes a stop word list, a set of phrase delimiters and a set of word delimiters as input parameters. Candidate keywords are identified using the stop words and phrase delimiters, and the co-occurrences of words within the candidate keywords are then used to score each candidate.

First, the document is split into an array of words at the word delimiters. This array is then split into sequences of contiguous words at phrase delimiters and stop word positions; each such sequence is a candidate keyword.

\cbox{Example - Text Document\\}{Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, 
and nonstrict inequations are considered. Upper bounds for components of a minimal set 
of solutions and algorithms of construction of minimal generating sets of solutions for all 
types of systems are given. These criteria and the corresponding algorithms for 
constructing a minimal supporting set of solutions can be used in solving all the 
considered types of systems and systems of mixed types.}

\cbox{Example - Candidate Keywords\\}{Compatibility - systems - linear constraints - set - natural numbers - Criteria - 
compatibility - system - linear Diophantine equations - strict inequations - nonstrict 
inequations - Upper bounds - components - minimal set - solutions - algorithms - 
minimal generating sets - solutions - systems - criteria - corresponding algorithms - 
constructing - minimal supporting set - solving - systems - systems}
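
To make the splitting step concrete, consider the sentence beginning ``Criteria of compatibility'' in the example above. Assume, for illustration only, that the stop word list contains \textit{of}, \textit{a}, \textit{and}, \textit{are} and \textit{considered}, and that commas and full stops act as phrase delimiters (the actual lists are inputs to RAKE and are not given here). The candidates, shown in bold, are then the maximal runs of words between stop words and delimiters:

\begin{quote}
\textbf{Criteria} of \textbf{compatibility} of a \textbf{system} of \textbf{linear Diophantine equations}, \textbf{strict inequations}, and \textbf{nonstrict inequations} are considered.
\end{quote}

This reproduces the six candidates from \textit{Criteria} to \textit{nonstrict inequations} listed above.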

%\todo{EXAMPLE - scores}

\cbox{Example - Final Keyword Scores\\}{minimal generating sets (8.7), linear diophantine equations (8.5), minimal supporting set (7.7), minimal set (4.7), linear constraints (4.5), natural numbers (4), strict inequations (4), 
nonstrict inequations (4), upper bounds (4), corresponding algorithms (3.5), set (2), algorithms (1.5), compatibility (1), systems (1), criteria (1), system (1), components (1), constructing (1), solving (1)}

%\noindent \textbf{Scoring}

Once all candidate keywords have been identified, a graph of word co-occurrences is built. The score of a candidate keyword is the sum of the scores of its member words.

Word scores are based on the following metrics:

\begin{itemize}[nolistsep]
\item word frequency \textit{freq(w)}
\item word degree \textit{deg(w)}
\item ratio of word degree to frequency \textit{deg(w) / freq(w)}
\end{itemize}
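
One possible way to write the first two quantities explicitly is sketched below. The symbols $C$ (the list of all candidate keyword occurrences in the document) and $|k|$ (the number of words in candidate $k$) are notation introduced here for illustration. The sketch assumes that a word appears at most once within a candidate and that a word's degree includes its co-occurrence with itself:

\[
\mathit{freq}(w) = \bigl|\{\, k \in C : w \in k \,\}\bigr|, \qquad
\mathit{deg}(w) = \sum_{k \in C :\, w \in k} |k| .
\]

Under these assumptions $\mathit{deg}(w) \geq \mathit{freq}(w)$, with equality exactly when $w$ only ever occurs as a single-word candidate.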

\textit{deg(w)} favours words that occur often and in longer candidate keywords. \textit{freq(w)} favours words that occur in many candidates, regardless of how many other words they co-occur with. \textit{deg(w)/freq(w)} favours words that occur predominantly in longer candidate keywords.
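
As a worked check against the example scores above, take the word \textit{minimal}. It occurs in three candidates: \textit{minimal set}, \textit{minimal generating sets} and \textit{minimal supporting set}. Assuming that the degree of a word is the total length of the candidates it occurs in, and that no stemming is applied (so \textit{set} and \textit{sets} are distinct words):

\[
\mathit{freq}(\mathrm{minimal}) = 3, \qquad
\mathit{deg}(\mathrm{minimal}) = 2 + 3 + 3 = 8, \qquad
\frac{\mathit{deg}(\mathrm{minimal})}{\mathit{freq}(\mathrm{minimal})} = \frac{8}{3} \approx 2.67 .
\]

The words \textit{generating} and \textit{sets} each occur once, in a three-word candidate, so both have a ratio of $3/1 = 3$. Summing the member word scores gives

\[
\mathit{score}(\mathrm{minimal\ generating\ sets}) \approx 2.67 + 3 + 3 = 8.67 \approx 8.7 ,
\]

which agrees with the value listed in the example, so those scores are consistent with the $\mathit{deg}(w)/\mathit{freq}(w)$ metric.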

Because candidate keywords are split at stop words, no candidate keyword can contain a stop word, so keywords such as \textit{Times of India} would be missed. To recover them, if a pair of candidate keywords adjoin each other in the same order at least twice in the document, their combination, including the interior stop words, is added to the set of candidate keywords.

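RAKE was evaluated on the collection of technical abstracts used by Hulth (2003). With a self-generated stop word list (document frequency $\mathit{df} > 10$) it achieved 33.7\% precision, 41.5\% recall and an F-measure of 37.2\%, compared with 31.2\% precision, 43.1\% recall and an F-measure of 36.2\% for TextRank's best reported configuration; RAKE is therefore higher on precision and F-measure, though not on recall.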
